
Blockwise float8 quantizer and quantized tensor class #1513

Open · wants to merge 2 commits into main
Conversation

@kwyss-nvidia commented Feb 27, 2025

Description

Adds PyTorch and C++ quantizer and quantized tensor classes for a subchannel quantization scheme.

The classes are configurable for a 128x128 block size (block_scaling_dim == 2) or a 1x128 block size (block_scaling_dim == 1).

Scale tensors are stored in a format amenable to matrix multiplication; however, matmul integration is deferred to a separate story.

Fusions of quantization with DBIAS or activation functions are not yet implemented, and dequantization is currently implemented in torch.

Tests for quantization are included at the C++ and PyTorch layers, with exact comparison against reference quantizer behavior, plus coverage of interesting API branches such as tensor creation in PyTorch and C++ and dequantization of row-wise and column-wise usage.

Two CUDA kernels for quantization are included.
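
For illustration, below is a minimal PyTorch sketch of the 1x128 subchannel scheme described above: each block of 128 contiguous values along the last dimension shares one scale chosen so the block amax maps to the FP8 maximum (the 128x128 case is analogous, with one scale per 128x128 tile). This is a reference for the quantization math only, not the API added in this PR; the function names and the choice of E4M3 are assumptions.

import torch

def quantize_1x128_reference(x: torch.Tensor, block: int = 128):
    """Reference 1x128 blockwise FP8 quantization (illustrative only)."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max   # 448 for E4M3
    rows, cols = x.shape
    assert cols % block == 0, "a real kernel must handle ragged tails"
    blocks = x.view(rows, cols // block, block).float()
    amax = blocks.abs().amax(dim=-1, keepdim=True)
    scale = fp8_max / amax.clamp(min=1e-12)          # per-block quantization scale
    q = (blocks * scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    # Keep the dequantization scale (1/scale) per block, shaped (rows, cols // block),
    # so it can later be applied alongside a GEMM.
    return q.view(rows, cols), (1.0 / scale).squeeze(-1)

def dequantize_1x128_reference(q: torch.Tensor, dq_scale: torch.Tensor, block: int = 128):
    rows, cols = q.shape
    blocks = q.view(rows, cols // block, block).float()
    return (blocks * dq_scale.unsqueeze(-1)).view(rows, cols)

# Round trip: reconstruction error is bounded by FP8 rounding within each block.
x = torch.randn(256, 512)
q, s = quantize_1x128_reference(x)
assert (dequantize_1x128_reference(q, s) - x).abs().max() < 0.1 * x.abs().max()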

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • PyTorch/C++ Quantizer class
  • PyTorch/C++ Quantized Tensor class
  • Quantization CUDA kernels for 1x128 and 128x128 block sizes
  • C++ testing of nvte_quantize API
  • Python testing of quantization via tex.quantize
  • Basic Quantizer
    • 2D with tests
    • 1D with tests
    • C++ bitwise tests
    • Generalized shape coverage
  • Python bitwise tests for Quantizer
  • Columnwise test coverage (see the conceptual sketch after this list)
    • Remove row-wise usage and check dequantize
  • Create Tensor in C++ test coverage
    • 1D
    • 2D
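
As a conceptual illustration of the column-wise coverage item above (not the PR's actual test code), a column-wise copy can be modeled with the reference sketch from the Description by quantizing the transpose, so blocks run along the other dimension; dequantizing either copy should recover the input up to FP8 rounding. All names below are hypothetical.

import torch

# Reuses quantize_1x128_reference / dequantize_1x128_reference from the sketch above.
x = torch.randn(256, 512)

# Row-wise usage: 1x128 blocks along the last dimension.
q_row, s_row = quantize_1x128_reference(x)

# Column-wise usage modeled as quantizing the transpose, so blocks run down the columns.
q_col, s_col = quantize_1x128_reference(x.t().contiguous())

x_from_row = dequantize_1x128_reference(q_row, s_row)
x_from_col = dequantize_1x128_reference(q_col, s_col).t()

# Different block partitions quantize differently, so the two round trips agree
# only up to FP8 rounding, not bitwise.
torch.testing.assert_close(x_from_row, x_from_col, atol=0.05, rtol=0.2)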

Checklist that can arguably be deferred to a future MR:

  • PyTorch API surface
    • get/set data
    • Operations other than quant/dequant
    • View/Reshape
  • Fused DBIAS/Activation
  • Dequantize in C++

Tasks that depend on a GEMM and are therefore not included:

  • GEMM implementation in general_gemm
  • Recipe Setup
  • Layer-wise numerical testing
  • Distributed numerical testing

Test Instructions

Python tests:

pytest tests/pytorch/test_float8blockwisetensor.py
pytest tests/pytorch/test_float8_blockwise_scaling_exact.py

C++ tests:

TE_PATH=<where_is_TE>/ bash qa/L0_cppunittest/test.sh
# Wait for the build to complete.
# To run specific tests
./tests/cpp/build/operator/test_operator --gtest_filter='*FusedCastFloat8*wiseTestSuite*'

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

kwyss-nvidia and others added 2 commits February 26, 2025 15:05
The classes are configurable for 128x128 block size
and 1x128 block size via setting block_scaling_dim == 2 or 1 respectively.

Scale tensors are stored in a format amenable to matrix multiplication;
however, matmul integration is deferred to a separate story.

Fusions of quantization with DBIAS or activation functions are not yet
implemented, and dequantization is currently implemented in torch.

Tests for quantization are included at the C++ and PyTorch layers, with
exact comparison against reference quantizer behavior as well as coverage
of interesting API branches such as tensor creation in PyTorch and C++
and dequantization of row-wise and column-wise usage.

Two CUDA kernels for quantization are included; they are direct ports
of equivalents in the kitchen repository, where a subchannel recipe
has been used for end-to-end training.
@zhongbozhu

Great to see this PR!

Can you add a description of how to run your unit tests? Thank you.

@@ -96,7 +96,7 @@ def prepare_for_saving(self) -> Tuple[list[Optional[torch.Tensor]], MXFP8TensorB
         """Prepare the tensor base for saving for backward

         After calling this, the tensor instance does not hold any
-        data.
+        data. Yes it does? TODO
Member

It is being fixed in #1500.

Author

Thanks. I'll track that as an example of a good pattern to follow for Float8BlockwiseQTensorBase.
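
For context on the pattern being referenced, here is a rough sketch of how a prepare_for_saving/restore_from_saved pair typically hands buffers to autograd and drops the instance's own references (hypothetical class and attribute names; this is not the Transformer Engine implementation, and the docstring fix itself lands in #1500):

from typing import Optional, Tuple
import torch

class BlockwiseQTensorBaseSketch:
    """Illustrative container for quantized data plus per-block scales."""

    def __init__(self, rowwise_data: torch.Tensor, rowwise_scale: torch.Tensor):
        self._rowwise_data: Optional[torch.Tensor] = rowwise_data
        self._rowwise_scale: Optional[torch.Tensor] = rowwise_scale

    def prepare_for_saving(self) -> Tuple[list, "BlockwiseQTensorBaseSketch"]:
        # Hand the raw buffers to autograd's save_for_backward and clear the
        # instance's references so no tensor is held in two places.
        tensors = [self._rowwise_data, self._rowwise_scale]
        self._rowwise_data = None
        self._rowwise_scale = None
        return tensors, self

    def restore_from_saved(self, tensors: list) -> list:
        # Re-attach the buffers returned by prepare_for_saving and hand back
        # the unconsumed remainder of the saved-tensor list.
        self._rowwise_data, self._rowwise_scale = tensors[0], tensors[1]
        return tensors[2:]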
